Return-Path: <icon-group-sender>
Received: from kingfisher.CS.Arizona.EDU (kingfisher.CS.Arizona.EDU [192.12.69.239])
by baskerville.CS.Arizona.EDU (8.8.7/8.8.7) with SMTP id IAA19541
for <icon-group-addresses@baskerville.CS.Arizona.EDU>; Mon, 16 Mar 1998 08:02:35 -0700 (MST)
Received: by kingfisher.CS.Arizona.EDU (5.65v4.0/1.1.8.2/08Nov94-0446PM)
id AA17496; Mon, 16 Mar 1998 08:02:34 -0700
From: gep2@computek.net
Date: Fri, 13 Mar 1998 21:54:29 -0600
Message-Id: <199803140354.VAA03466@axp.cmpu.net>
Mime-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Subject: Re: Letter Probabilities
To: icon-group@optima.CS.Arizona.EDU
X-Mailer: SPRY Mail Version: 04.00.06.17
Errors-To: icon-group-errors@optima.CS.Arizona.EDU
Status: RO
Content-Length: 3512
> Several people have simultaneously suggested the generator string idea.
This is hardly surprising. :-)
> The probability table is simply a requirement for output. As long as
> I'm going to compute it anyway, it's useful.
Fine, but that doesn't mean you have to use it for generating your random text.
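
To make the "generator string" suggestion concrete, here is a minimal sketch in Python (not Icon). It assumes the idea is simply to pick characters at random from the sample text itself; the names `random_text` and `sample_text` are hypothetical, not taken from the thread. In Icon, the random-element operator `?s` applied to the sample string would presumably serve the same purpose.

```python
import random

def random_text(sample_text, length, rng=random):
    """Draw `length` characters, each picked uniformly from the sample itself."""
    # A uniformly random position in the sample reproduces the sample's
    # letter frequencies without building any probability table.
    return "".join(rng.choice(sample_text) for _ in range(length))

rng = random.Random(1998)  # fixed seed, only so that runs are repeatable
print(random_text("call me ishmael", 20, rng))
```

The probability table can still be computed for output; it just plays no part in the generation step.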
> In English, the space character is always first, followed by lower case
> 'e' with probability about 0.10. Some results are counterintuitive,
> such as 'y' happening 50% more often than 'b' in the sample below
> (computed from a small portion of "Moby Dick").
Yeah, I think several things about your table are quite suspect. Perhaps you
ought to use a more modern text for computing your probabilities.
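
For readers who want to rebuild such a table from a different text, a rough Python sketch follows. It assumes the table is nothing more than each character's count divided by the total number of characters; `letter_probabilities` is a hypothetical helper name.

```python
from collections import Counter

def letter_probabilities(text):
    """Return (character, probability) pairs, most frequent first."""
    counts = Counter(text)
    total = len(text)
    return [(ch, n / total) for ch, n in counts.most_common()]

# Toy input; a real run would read a much longer text from a file.
for ch, p in letter_probabilities("call me ishmael"):
    print(repr(ch), round(p, 3))
```

Rounding on output keeps the printed values to a precision the sample can plausibly support.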
> I have only been at Icon for a few weeks and think I have a firm grasp
> of it.
That sounds like an "oxymoron" to me. Frankly, I don't think you'll REALLY grasp
it in just a few weeks.
> Whether it is ideal for this problem or not, I would like to know
> whether Icon has some elegant mechanism for scanning such an ordered
> list.
I think that elegance comes from solving a given problem in the most effective,
efficient, and simple way... not from an elegant coding of an undesirable
algorithm.
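
For reference, the kind of scan the quoted question asks about can be sketched as below, in Python rather than Icon. This is only an illustration of the table-scanning approach under discussion, not a recommendation of it; `pick` and the three-entry toy table are hypothetical.

```python
import random

def pick(table, r):
    """Return the symbol whose cumulative probability range contains r."""
    for symbol, p in table:
        if r < p:
            return symbol
        r -= p  # move past this symbol's share of the interval [0, 1)
    return table[-1][0]  # guard against rounding when r is very close to 1

table = [(" ", 0.5), ("e", 0.3), ("t", 0.2)]  # toy table, most frequent first
print(pick(table, 0.65))             # 0.65 falls in the range belonging to "e"
print(pick(table, random.random()))  # typical use: one random draw per character
```

Keeping the most frequent symbols first means the loop usually stops after only a few comparisons.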
>[letter frequencies]
" "<--->0.1751922190691018
"e"<--->0.09672803124014646
"t"<--->0.07254602343010987
"o"<--->0.06209221664362462
"a"<--->0.06182541415023404
"s"<--->0.05256009119794318
"n"<--->0.05175968371777146
"i"<--->0.0484610347085789
"h"<--->0.04632661476145431
"r"<--->0.04501685706662785
"l"<--->0.0317980062577312
"d"<--->0.03043973901865191
"u"<--->0.02127143515486672
"m"<--->0.01979189405515535
"g"<--->0.01763321933590433
"c"<--->0.01717237866550243
"f"<--->0.01707535957699677
"w"<--->0.01554730893303257
"y"<--->0.01554730893303257
"p"<--->0.0151349778068835
","<--->0.0151349778068835
"\n"<--->0.010089985204589
"b"<--->0.009968711343956922
"v"<--->0.007397705498556839
"."<--->0.006306240752868125
"-"<--->0.00616071212010963
"k"<--->0.005821145310339808
"I"<--->0.004826699653156758
";"<--->0.001843362681607606
"T"<--->0.001503795871837784
"?"<--->0.00140677678333212
"B"<--->0.001309757694826457
"W"<--->0.001309757694826457
"S"<--->0.001091464745688714
"N"<--->0.001042955201435882
"A"<--->0.0009701908850566347
"C"<--->0.000921681340803803
"x"<--->0.0008974265686773871
"z"<--->0.0008246622522981394
"j"<--->0.0007033883916660602
"q"<--->0.0006548788474132285
"!"<--->0.0006306240752868126
"P"<--->0.0006306240752868126
"'"<--->0.0006063693031603967
"H"<--->0.0005093502146547332
"F"<--->0.0004850954425283173
"L"<--->0.0004365858982754856
"M"<--->0.0004123311261490697
"D"<--->0.000363821581896238
"E"<--->0.0003395668097698222
"G"<--->0.0003153120376434063
"R"<--->0.0002910572655169904
"Y"<--->0.0001940381770113269
"O"<--->0.0001455286327584952
")"<--->9.701908850566347e-5
":"<--->9.701908850566347e-5
"("<--->9.701908850566347e-5
"J"<--->9.701908850566347e-5
"V"<--->4.850954425283173e-5
"U"<--->4.850954425283173e-5
"Q"<--->4.850954425283173e-5
To start with, I'll comment that it's ludicrous to present such "probabilities"
to 16 significant digits... at least half (probably two-thirds or more) of those
digits are totally, absolutely meaningless.
I will comment, however, that given your table I could probably even tell you,
with pretty good certainty, which roughly 20K piece of Moby Dick you
started with. :-)
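
One way to see why the trailing digits carry so little is the sketch below. It assumes every table entry is an integer count divided by the sample length N, and searches for the smallest N that turns a handful of entries into whole numbers. The function name, search limit, and tolerance are hypothetical choices; the six values are copied from the quoted table.

```python
def implied_sample_size(probs, limit=100_000, tol=1e-6):
    """Smallest N for which every p * N is (nearly) a whole number, else None."""
    for n in range(1, limit + 1):
        if all(abs(p * n - round(p * n)) < tol for p in probs):
            return n
    return None

# The six smallest distinct entries of the quoted table.
smallest = [
    4.850954425283173e-5,
    9.701908850566347e-5,
    0.0001455286327584952,
    0.0001940381770113269,
    0.0002910572655169904,
    0.0003153120376434063,
]
n = implied_sample_size(smallest)
print(n, [round(p * n) for p in smallest])
```

Under that assumption, each entry holds no more information than one small integer count and the sample length, so the long decimal expansions are an artifact of floating-point division rather than a measurement of English.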
Gordon Peterson
http://www.computek.net/public/gep2/
Support the Anti-SPAM Amendment! Join at http://www.cauce.org/